University of Lethbridge’s Participation in DUC-2007 Main Task

Authors

  • Yllias Chali
  • Shafiq R. Joty
Abstract

This paper describes a query-focused multi-document summarizer based on two distinct but complementary concepts: (a) how closely a sentence is related to the user query and (b) how salient the sentence is to the overall concept. With these in focus, we consider six features: (1) Cosine Similarity, (2) Lexical Chains, (3) BE Overlap, (4) Question Focus Overlap, (5) Previous Sentence Overlap, and (6) Document Overlap. We use the Cosine Similarity measure to compute sentence importance based on the concept of eigenvector centrality in a graph representation of sentences (Erkan and Radev, 2004). Lexical chains efficiently identify the theme of a document. A further argument for the chain representation over a simple word-frequency model is the case in which a single concept is represented by several words, each with relatively low frequency: because a chain combines the occurrences of all its members, it can outweigh any single word (Chali and Kolla, 2004). With Basic Elements (BEs) represented as head-modifier-relation triples, one can quite easily decide whether any two units match (express the same meaning) or not, considerably more easily than with longer units (Hovy et al., 2005). We use the Question Focus Overlap feature to extract sentences that are relevant to the topic and narration. We use the remaining two features, Previous Sentence Overlap and Document Overlap, to increase coherence among the sentences in the summary.
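The centrality idea behind the Cosine Similarity feature can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the plain term-frequency vectors (no IDF weighting), the `threshold` and `damping` values, and the function names `cosine` and `centrality_scores` are all assumptions chosen for illustration. Sentences become graph nodes, an edge links two sentences whose cosine similarity reaches the threshold, and a damped power iteration approximates eigenvector centrality.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centrality_scores(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Score sentences by damped eigenvector centrality on a similarity graph.

    threshold and damping are illustrative values, not taken from the paper.
    """
    n = len(sentences)
    if n == 0:
        return []
    # Assumed tokenization: lowercase and split on whitespace.
    vectors = [Counter(s.lower().split()) for s in sentences]

    # Row-normalised adjacency matrix: an edge exists where the cosine
    # similarity reaches the threshold (each non-empty sentence links to itself).
    rows = []
    for i in range(n):
        row = [1.0 if cosine(vectors[i], vectors[j]) >= threshold else 0.0
               for j in range(n)]
        total = sum(row)
        # A sentence with no edges (e.g. an empty string) spreads evenly.
        rows.append([v / total for v in row] if total else [1.0 / n] * n)

    # Power iteration: repeatedly redistribute score along the edges.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * rows[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores
```

Calling `centrality_scores` on a list of sentence strings returns one score per sentence, and the scores sum to 1. A sentence that is similar to many other sentences receives a higher score, which is the sense in which it is salient to the overall concept.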

Similar resources

The LIA summarization system at DUC-2007

This paper presents the LIA summarization systems participating in DUC 2007. This is the second participation of the LIA at DUC and we will discuss our systems in both main and update tasks. The system proposed for the main task is the combination of seven different sentence selection systems. The fusion of the system outputs is made with a weighted graph where the cost functions integrate the ...

The University of British Columbia at TAC 2008

In this paper we describe the University of British Columbia’s participation in the Text Analysis Conference 2008. This work represents our first submission to the DUC/TAC series of conferences, and we participated in both the summarization tasks: the main update task as well as the pilot task on summarizing blog opinions. We describe our systems in detail and describe our performance in the co...

Columbia University at DUC 2004

We describe our participation in tasks 2, 4 and 5 of the DUC 2004 evaluation. For each task, we present the system(s) used, focusing on novel and newly developed aspects. We also analyze the results of the human and automatic evaluations.

Generating Update Summaries for DUC 2007

Update summaries as defined for the new DUC 2007 task deliver focused information to a user who has already read a set of older documents covering the same topic. In this paper, we show how to generate this kind of summary from the same data structure—fuzzy coreference cluster graphs—as all other generic and focused multi-document summaries. Our system ERSS 2007 implementing this algorithm also...

Fudan University at DUC 2005

In this paper, we describe our participation in the question-focused multi-document summarization task of DUC 2005. Our system is based on a supervised machine learning method in which feature extraction is an important issue. We present the whole procedure of our system, focusing on the features we used. We also analyze the results of manual and automatic evaluations.

Publication date: 2007